Learning Subcategorization
نویسنده
چکیده
A method to identify the subcategorized constituents of a verb (its complements) automatically in a sentence is useful in various areas of Natural Language Processing (e.g. automatic acquisition of subcategorization lexicons, parsing, acquisition of verb semantics, information retrieval). I will describe a method for subcategorization identification that uses memorybased learning. Train and test material is extracted from the Penn Treebank II Wall Street Journal Corpus. It makes use of the part-of-speech tags of the words in the sentence and the chunks: base (i.e. non-recursive) noun phrases, adjectival and adverbial phrases. In the training material, each word or chunk has an associated class that describes its relation to the verb; whether it is directly dependent on the verb (the attachment), and if so, what its syntactic category and grammatical function are (e.g. predicative NP, temporal adjunct PP). All the training examples are stored in memory. Then the memory-based learner classifies new test instances by extrapolating from the class of the most similar training example. In doing so, it achieves a generalization accuracy of 88.3% on the word/chunk level, which corresponds to 91% of correctly identified subcategorization frames. LEARNING SUBCATEGORIZATION
منابع مشابه
Learning Probabilistic Subcategorization Preference by Identifying Case Dependencies and Optimal Noun Class Generalization Level
This paper proposes a novel method of learning probabilistic subcategorization preference. In the method, for the purpose of coping with the ambiguities of case dependencies and noun class generalization of argument/adjunct nouns, we introduce a data structure which represents a tuple of independent partial subcategorization frames. Each collocation of a verb and argument/adjunct nouns is assum...
متن کاملMaximum Entropy Model Learning of Subcategorization Preference
Abstract This paper proposes a novel method for learning probabilistic models of subcategorization preference of verbs. Especially, we propose to consider the issues of case dependencie~ and noun class generalization in a uniform way. We adopt the maximum entropy model learn~,g method and apply it to the task of model learning of subcategorization preference. Case dependencies and noun class ge...
متن کاملLearning Verb Subcategorization from Corpora: Counting Frame Subsets
We present some novel machine learning techniques for the identification of subcategorization information for verbs in Czech. We compare three different statistical techniques applied to this problem. We show how the learning algorithm can be used to discover previously unknown subcategorization frames from the Czech Prague Dependency Treebank. The algorithm can then be used to label dependents...
متن کاملLearning Probabilistic Subcategorization Preference and its Application to Syntactic Disambiguation
This paper proposes a novel method of learning probabilistic subcategorization preference. In the method, for the purpose of coping with the ambiguities of case dependencies and noun class generalization of argument/adjunct nouns, we introduce a data structure which represents a tuple of independent partial subcategorization frames. Each collocation of a verb and argument/adjunct nouns is assum...
متن کاملAutomatic Extraction of Subcategorization Frames for Czech
We present some novel machine learning techniques for the identification of subcategorization information for verbs in Czech. We compare three different statistical techniques applied to this problem. We show how the learning algorithm can be used to discover previously unknown subcategorization frames from the Czech Prague Dependency Treebank. The algorithm can then be used to label dependents...
متن کاملLearning Automatic Acquisition of Subcategorization Frames Using Bayesian Inference and Support Vector Machines
Learning Bayesian Belief Networks (BBN) from corpora and Support Vector Machines (SVM) have been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We are incorporating minimal linguistic resources, i.e. basic morphological tagging and phrase chunking, to demonstrate that verb subcategorization, which is of great significance for developing robust natural la...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007